This study examines the behavior of a smart RED algorithm whose parameters are adapted by a neural network 
structure. Previously, Kim et al. [2] and Basheer et al. [1] introduced the use of deep 
reinforcement learning for active queue management (AQM). This work studies the performance of deep 
reinforcement learning using a simple topology: transmitters and receivers communicate, and all information 
is routed over a single bottleneck link. It studies the effect of changing the bottleneck link bandwidth 
and bottleneck latency on this algorithm. It is observed that increasing the training data size improves the 
performance of the algorithm. This paper shows a detailed flowchart of the training process and the specific 
hardware configuration, and also presents results. 
1. Introduction 


The goal of queue management is to selectively drop packets from the queue of packets to be transmitted. Packets are 
dropped to reduce latency for all packets in the network, to reduce the probability of time-outs for further packets, 
and to ensure fairness, so that a single user cannot send so many packets that all other users are blocked at intermediate 
nodes in the network (the lock-out phenomenon). Many Active Queue Management (AQM) algorithms have been 
proposed, such as Random Early Detection (RED) and Proportional Integral Controller Enhanced (PIE). 

The RED algorithm was chosen as the algorithm that a neural network model learns to control. The model updates the 
RED parameters according to the environment by measuring the reward after each action performed on the algorithm. 


2. Background and Prior Work 


This paper studies a link which bridges multiple communicating nodes on two networks and which is overloaded with 
traffic. The goal is to study the deep reinforcement learning based Deep Q-Network (DQN) algorithm and to compare it 
to the conventional random early detection (RED) queue management algorithm. 


A. Active Queue Management Algorithms 


Active queue management (AQM) is used in routers and switches which receive a large number of packets from the 
upstream network, in order to minimize delays and prevent the system from locking up. Packets are selectively dropped 
to keep traffic moving into the network. The goal is to reduce local congestion in the network and to improve end-to-end 
latency. Latency is particularly important for real-time applications such as voice and video conferencing and video 
streaming, which are increasing as a proportion of internet traffic. It is also particularly important for real-time 
systems which use data to make decisions in new Internet of Things applications [1]. 


To regulate and prevent network congestion, it is critical to use resilient active queue management (AQM) algorithms. 
Minsu Kim et al. [1] have proposed a novel self-learning network management algorithm using deep reinforcement 
learning for use on fog/edge nodes in an Internet of Things (IoT) network. In their work they evaluate a Deep 
Reinforcement Learning (DRL) based active queue management algorithm called the Deep Q-Network (DQN) 
algorithm [1]. A similar algorithm with some differences in the formulation of the reward function has been 
implemented by Ma et al. [2]. Szyguła et al. [3] use a Convolutional Neural Network (CNN) to train the parameters 
of a Proportional Integral (PI*) AQM algorithm. Abbas et al. [4] summarize recent work in AQM. AQM algorithms 
recommended by the IETF for use on the internet include random early detection (RED), Proportional Integral 
Controller Enhanced (PIE), explicit congestion notification (ECN), and controlled delay (CoDel) [5]. 


To study the DQN algorithm we compare it to an existing algorithm, the Random Early Detection (RED) algorithm 
which is widely used today. 
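

As a point of reference, the classic RED drop decision can be sketched as follows. This is a minimal illustration of 
the textbook algorithm, not the ns-3 implementation used in the simulations: the class and parameter names (min_th, 
max_th, max_p, weight) are assumptions chosen for readability, and refinements such as spacing drops by a packet 
count or handling idle periods are omitted. These thresholds and the maximum drop probability are the kind of RED 
parameters that a learning agent could adjust. 

```python
class RedQueue:
    """Simplified classic RED: drop probability from an averaged queue length."""

    def __init__(self, min_th, max_th, max_p, weight):
        self.min_th = min_th   # below this average queue length, never drop
        self.max_th = max_th   # at or above this average queue length, always drop
        self.max_p = max_p     # drop probability reached just below max_th
        self.weight = weight   # weight of the newest sample in the moving average
        self.avg = 0.0         # exponentially weighted average queue length

    def update_average(self, queue_len):
        """Fold the instantaneous queue length into the running average."""
        self.avg = (1.0 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self):
        """Probability of dropping an arriving packet given the current average."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        # Linear ramp from 0 at min_th up to max_p at max_th.
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
```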


B. Deep Reinforcement Learning 


This paper studies a new approach to queue management using reinforcement learning. To design a reinforcement 
learning algorithm the system must be described in terms of its state, the actions that can be taken, and rewards. In 
each state, the actions an agent can take are ranked according to the rewards that they can return. The system 
updates the expected reward of its actions as it operates. Actions are chosen so as to maximize the reward of 
the system, but occasionally the agent takes a suboptimal action to explore the possibility of getting more reward. 
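

The trade-off between exploiting the best known action and occasionally exploring can be sketched with an 
epsilon-greedy rule. This is a common choice and is shown only as an illustration; the function name and the 
epsilon parameter are assumptions, and the exploration strategy actually used in the simulations is not specified 
here. 

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else the best one.

    q_values: list of estimated rewards, one entry per available action.
    epsilon:  exploration probability in [0, 1] (illustrative parameter).
    rng:      source of randomness; a random.Random instance can be passed in.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated reward.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon set to 0 the agent always takes the highest-ranked action; larger values make exploratory actions more 
frequent. 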


In deep reinforcement learning the neural network is trained using the reward functions to rank possible actions and 
suggest the best course of action in any state. Deep reinforcement learning has been applied to many problems in 
communications and networking [8,9]. 
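

In a Deep Q-Network, the network is typically trained toward a target built from the observed reward and the best 
value predicted for the next state. The sketch below shows that standard target in isolation; the discount factor 
gamma and its default value are assumptions for illustration and are not taken from this paper. 

```python
def td_target(reward, next_q_values, gamma=0.99):
    """Standard Q-learning target: reward plus discounted best next-state value.

    reward:        reward observed after taking the action.
    next_q_values: the network's value estimates for each action in the next state.
    gamma:         discount factor (assumed value, for illustration only).
    """
    return reward + gamma * max(next_q_values)
```

The difference between this target and the network's current estimate for the chosen action is what the training 
step tries to reduce. 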


The reward in a queueing system is based on the throughput (percentage of offered load which is successfully 
transmitted) and the latency of received packets. In general, as the length of a queue grows, the latency incurred by all 
packets increases. Dropping packets decreases the length of the queue but reduces throughput, since not all 
packets are successfully forwarded to the network. These two goals are balanced using a tunable parameter in the 
DQN algorithm. 


3. System Model 
In the network topology under consideration there are multiple nodes on two sides of a bottleneck link. Data is 
transmitted from nodes P1, P2, ..., PK to nodes Q1, Q2, ..., QK. There is a low bandwidth bottleneck link between the two 
networks. In simulations, as detailed in Table 4, the number of transmitter/receiver pairs is varied from K = 2 to 8. 


The bottleneck link is the one that connects Node N1 to Node N2. The bandwidth and the link delay on the bottleneck 
link are controlled in simulations. N1 is the router where the AQM algorithm is implemented. 


Figure 1. Network structure. 


The algorithm works by balancing two types of utility functions, referred to as rewards: 

• The delay reward increases with decreased delay time. 

• The enqueue reward increases as the rate of packet drops decreases. 

The two rewards must be balanced: when most packets are dropped the delay reward will increase (but many packets 
will be lost), whereas if most packets are kept the enqueue reward will increase but packets may experience very long 
delays. The balance between the two is controlled using a scale factor parameter δ. 

where Equation (4) is as used in the Proportional Integral Controller Enhanced (PIE) AQM protocol. The clipping 
function is a nonlinear function which limits the reward to be between -1 and 1 to enhance the stability of the algorithm. 


Table 1. Equation variables. 


Symbol | Name | Description 
R | Reward | Reward parameter used to tune the queue size. 
δ | Scale Factor | Scales the balance between enqueue reward and delay reward. 
R_D | Delay Reward | Increases with decreased delay time of packets. 
R_E | Enqueue Reward | Increases as the rate of packet drops is reduced. 
d_qd | Desired Queue Delay | Delay that can be tolerated by user, default value is 0. May be used to define different classes of service. 
d_qc | Current Queue Delay | Delay currently experienced by a user. Estimated using the formula used by the Proportional Integral Controller Enhanced (PIE) AQM protocol. 
d_min | Minimum Expected Delay | Delay experienced by the packet at the end of the queue if transmit channel is continuously in use. 
Le | Average Queue Length | Average queue length for the user (measured, in units of bytes). 
B | Bandwidth in Bytes | Total network link bandwidth, available for use. 
N_en | Enqueue Counter | Counts number of packets in the queue which are not dropped. 
N_drop | Dequeue Counter | Counts number of packets in the queue which are dequeued/dropped. 
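

To make the roles of these variables concrete, the sketch below shows one plausible way a clipped, weighted reward 
could be computed from them. The functional forms are assumptions made purely for illustration and are not a 
reproduction of the paper's equations: the current queue delay is estimated as average queue length divided by 
bandwidth, the delay reward as desired delay minus current delay, the enqueue reward as the fraction of packets not 
dropped, and the two are combined as a weighted sum using the scale factor. All function and argument names are 
hypothetical. 

```python
def clip(value, low=-1.0, high=1.0):
    """Limit the reward to the interval [low, high]."""
    return max(low, min(high, value))


def current_queue_delay(avg_queue_bytes, bandwidth_bytes_per_s):
    """Assumed delay estimate: bytes waiting divided by drain rate, in seconds."""
    return avg_queue_bytes / bandwidth_bytes_per_s


def reward(scale, desired_delay, avg_queue_bytes, bandwidth_bytes_per_s,
           n_enqueued, n_dropped):
    """Illustrative clipped reward; the weighting form is an assumption."""
    d_qc = current_queue_delay(avg_queue_bytes, bandwidth_bytes_per_s)
    # Assumed delay reward: grows as the current delay shrinks.
    r_delay = desired_delay - d_qc
    # Assumed enqueue reward: fraction of arriving packets that were kept.
    total = n_enqueued + n_dropped
    r_enqueue = n_enqueued / total if total else 0.0
    # The scale factor shifts the emphasis between the two rewards.
    return clip(scale * r_delay + (1.0 - scale) * r_enqueue)
```

Under these assumptions, a scale factor near 1 emphasizes low delay, while a value near 0 emphasizes keeping 
packets in the queue. 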


4. Simulation Parameters and Results 


Table 2 summarizes the hardware and software used in testing. Table 3 shows the ns-3 training simulation 
parameters. 


The DQN and RED algorithms were implemented on the simulation platform ns-3 [11][12]. The deep learning was 
realized using TensorFlow under Python. Table 4 shows the values of bandwidth and latency that were used for the 
bottleneck link, the number of transmitter and receiver nodes, and the flow rate for each node. In every case the 
bottleneck link was highly loaded with offered traffic. The simulations collect queue length, throughput, delay, and 
packet loss rate data for the RED and DQN algorithms. Simulation time is set to sixty seconds. 


Table 2. Hardware and Software used in testing 


OS Version | Ubuntu 20.04.3 64 bit 
CPU | Intel Core i7-4500U 
RAM | 8 GB DDR3 
Storage | SSD 128 GB 
Python Version | Python 3.8.13 
TensorFlow | TensorFlow Version 2.4.1 
Python IDE | VS Code 1.67.1 
NS3 Version | 3.35 


Table 3. System Parameters 


Parameter | Value 
Bottleneck Bandwidth | 2 Mbps to 8 Mbps 
Bottleneck Delay | 5 ms 
Access Bandwidth | 1000 Mbps 
Access Delay | 0.1 ms 
Number of sender-sink agents | 2 to 8 
Flow Data Rates | 1 Mbps to 3 Mbps 
Simulation Duration | 60 s 


Table 4. Test Parameters 


Test No. | Simulation Duration (s) | Bottleneck Bandwidth (Mbps) | Bottleneck Delay (ms) | Number of Sender/Sink Apps | Flow Rate (Mbps) 
1 | 60 | 2 | 5 | 2 | 1 
2 | 60 | 2 | 5 | 3 | 1 
3 | 60 | 2 | 5 | 4 | 1 
4 | 60 | 4 | 5 | 3 | 2 
5 | 60 | 4 | 5 | 4 | 2 
6 | 60 | 4 | 5 | 5 | 2 
7 | 60 | 8 | 5 | 6 | 3 
8 | 60 | 8 | 5 | 7 | 3 
9 | 60 | 8 | 5 | 8 | 3 
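

The statement that the bottleneck is highly loaded can be checked with simple arithmetic: the offered load is the 
number of sender/sink applications times the per-flow rate, and dividing by the bottleneck bandwidth gives a load 
ratio. The short sketch below does this for tests 2 to 5 and 7 to 9 of Table 4; the function and variable names are 
illustrative, and protocol overhead is ignored. 

```python
# (test number, bottleneck bandwidth in Mbps, number of apps, flow rate in Mbps)
TESTS = [
    (2, 2, 3, 1),
    (3, 2, 4, 1),
    (4, 4, 3, 2),
    (5, 4, 4, 2),
    (7, 8, 6, 3),
    (8, 8, 7, 3),
    (9, 8, 8, 3),
]


def load_ratio(bandwidth_mbps, num_apps, flow_rate_mbps):
    """Offered load divided by bottleneck capacity (above 1 means overload)."""
    return num_apps * flow_rate_mbps / bandwidth_mbps


ratios = {test: load_ratio(bw, n, rate) for test, bw, n, rate in TESTS}
for test, ratio in sorted(ratios.items()):
    print(f"Test {test}: offered load is {ratio:.2f} times the bottleneck bandwidth")
```

In each of these tests the ratio exceeds 1, so the queue at the bottleneck router must either grow or drop packets. 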


Figure 2 shows the average round trip time (RTT) for the DQN based AQM algorithm compared to the 
RED algorithm for the scenarios listed in Table 4. It is observed that the RTT is almost the same for the 
two algorithms, with the DQN based algorithm having slightly lower RTT in every scenario. The 
improvement in RTT for each scenario is shown in Figure 3. 


Figure 4 shows the values of RTT over the 60 second simulation duration for Scenario 1 in Table 4. It can be seen 
that DQN RTT is lower for specific scenarios. Unfortunately, as with RED, the DQN algorithm also seems to lead to 
oscillations in round trip time; this is a topic for further work. 


Figure 2. Average RTT values. 




International Joint Conference on Engineering, Science and Artificial Intelligence-IJCESAI 2022 




Figure 3. Improvement in RTT for DQN over RED. 


Figure 4. RTT over time for DQN and RED in Scenario 1. 


5. Conclusion 


This paper presented simulations of two active queue management algorithms: RED and DQN-QM. Simulations were 
performed using ns-3 and TensorFlow. The deep reinforcement learning based AQM was found to outperform RED in 
terms of latency and throughput under many scenarios. 